R/RStudio and Quarto

MKTG 585R Quantitative Marketing Pre-PhD Seminar

R/RStudio

Orientation

  • Console: Where you run code and perform calculations.
  • Source: Create and save R scripts and send code to the Console.
  • Environment: A snapshot of what you have loaded.
  • Git: Where you can commit, pull, push, and change branches.
  • Help: Look up documentation, including “run examples” and cheatsheets.
  • Files: The working directory for your project.

RStudio Tools > Global Options

  • General > Save workspace to .RData on exit: Never.
  • Code > Editing > Execution > Ctrl + Enter executes: Multiple consecutive R lines.
  • Code > Editing > Use native pipe operator.
  • Code > Display > General > Highlight selected line.
  • Code > Display > General > Rainbow parentheses.
  • Code > Saving > Automatically save when editor loses focus.
  • Appearance.

Functions

We import, wrangle, visualize, and model data using functions. Functions are composed of arguments that tell the function how to operate.

  • Using a function is referred to as a “call” or a “function call.”
  • Don’t forget you have Help.

The tidyverse

The tidyverse is a collection of R packages that share common philosophies and are designed to work together. An R package is a collection of functions, documentation, and sometimes data.

install.packages("tidyverse") # Only install once.
library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.2     ✔ readr     2.1.4
✔ forcats   1.0.0     ✔ stringr   1.5.0
✔ ggplot2   3.4.2     ✔ tibble    3.2.1
✔ lubridate 1.9.2     ✔ tidyr     1.3.0
✔ purrr     1.0.2     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

Just Once: Clone Repo

Create a new project using Version Control > Git.

Safe File Paths

Using a .Rproj file helps us keep track of the working directory as well as safely deal with file paths. File paths can be tricky because they are operating system and user-specific. We can install the here package and use here() in conjunction with an RStudio project to write file paths for us.

here::here()
[1] "/Users/msb-mdotson23/Box Sync/Marketing Analytics and Pre-PhD Seminar Materials/MKTG 585R Quantitative Marketing Pre-PhD Seminar/seminar-materials"

Import CSV Files

Whenever we need a file path, we can use here() and build up from the working directory. For example, if customer_data.csv is in the Data folder in your working directory:

customer_data <- read_csv(here::here("Data", "customer_data.csv"))
Rows: 10531 Columns: 13
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (8): gender, married, college_degree, region, state, review_time, review...
dbl (5): customer_id, birth_year, income, credit, star_rating

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
  • Note that customer_data now appears in our Environment.
  • The Environment lists any objects (i.e., data, variables, or custom functions) we’ve assigned a name using <-.
  • We can also view objects in their own tab.

Data Frames

customer_data
# A tibble: 10,531 × 13
   customer_id birth_year gender income credit married college_degree region   
         <dbl>      <dbl> <chr>   <dbl>  <dbl> <chr>   <chr>          <chr>    
 1        1001       1971 Female  73000   742. No      No             South    
 2        1002       1970 Female  31000   749. Yes     No             West     
 3        1003       1988 Male    35000   542. No      No             South    
 4        1004       1984 Other   64000   574. Yes     Yes            Midwest  
 5        1005       1987 Male    58000   644. No      Yes            West     
 6        1006       1994 Male   164000   554. Yes     Yes            Midwest  
 7        1007       1968 Male    39000   608. No      No             Midwest  
 8        1008       1994 Male    69000   710. No      No             South    
 9        1009       1958 Male   233000   702. No      No             West     
10        1010       1994 Female  77000   605. Yes     No             Northeast
# ℹ 10,521 more rows
# ℹ 5 more variables: state <chr>, star_rating <dbl>, review_time <chr>,
#   review_title <chr>, review_text <chr>
glimpse(customer_data)
Rows: 10,531
Columns: 13
$ customer_id    <dbl> 1001, 1002, 1003, 1004, 1005, 1006, 1007, 1008, 1009, 1…
$ birth_year     <dbl> 1971, 1970, 1988, 1984, 1987, 1994, 1968, 1994, 1958, 1…
$ gender         <chr> "Female", "Female", "Male", "Other", "Male", "Male", "M…
$ income         <dbl> 73000, 31000, 35000, 64000, 58000, 164000, 39000, 69000…
$ credit         <dbl> 742.0827, 749.3514, 542.2399, 573.9358, 644.2439, 553.6…
$ married        <chr> "No", "Yes", "No", "Yes", "No", "Yes", "No", "No", "No"…
$ college_degree <chr> "No", "No", "No", "Yes", "Yes", "Yes", "No", "No", "No"…
$ region         <chr> "South", "West", "South", "Midwest", "West", "Midwest",…
$ state          <chr> "DC", "WA", "AR", "MN", "HI", "MN", "MN", "KY", "NM", "…
$ star_rating    <dbl> 4, NA, NA, NA, 5, 2, NA, 5, NA, 5, NA, 5, NA, NA, NA, N…
$ review_time    <chr> "06 11, 2015", NA, NA, NA, "03 25, 2008", "06 7, 2013",…
$ review_title   <chr> "Four Stars", NA, NA, NA, "Great Product!!", "Not at al…
$ review_text    <chr> "everything's fine", NA, NA, NA, "I looked all over the…

Filter Observations

We often want to filter our data to keep certain observations using column values.

filter(customer_data, college_degree == "Yes")
# A tibble: 7,782 × 13
   customer_id birth_year gender income credit married college_degree region   
         <dbl>      <dbl> <chr>   <dbl>  <dbl> <chr>   <chr>          <chr>    
 1        1004       1984 Other   64000   574. Yes     Yes            Midwest  
 2        1005       1987 Male    58000   644. No      Yes            West     
 3        1006       1994 Male   164000   554. Yes     Yes            Midwest  
 4        1013       1974 Female 197000   680. Yes     Yes            West     
 5        1015       1995 Male   347000   658. No      Yes            West     
 6        1016       1984 Other  292000   626. Yes     Yes            Northeast
 7        1017       1997 Male    68000   588. No      Yes            Midwest  
 8        1018       1984 Male   184000   666. No      Yes            Northeast
 9        1019       1984 Female  60000   758. No      Yes            West     
10        1021       1997 Female  20000   503. No      Yes            West     
# ℹ 7,772 more rows
# ℹ 5 more variables: state <chr>, star_rating <dbl>, review_time <chr>,
#   review_title <chr>, review_text <chr>
  • We can filter by values in a column not equalling something like region != "West".
  • We can filter by more than one logical argument, as with gender == "Female", income > 70000.

Slice Observations

Sometimes we want to slice our data to keep certain observations using their position in the data.

slice(customer_data, 1:5)
# A tibble: 5 × 13
  customer_id birth_year gender income credit married college_degree region 
        <dbl>      <dbl> <chr>   <dbl>  <dbl> <chr>   <chr>          <chr>  
1        1001       1971 Female  73000   742. No      No             South  
2        1002       1970 Female  31000   749. Yes     No             West   
3        1003       1988 Male    35000   542. No      No             South  
4        1004       1984 Other   64000   574. Yes     Yes            Midwest
5        1005       1987 Male    58000   644. No      Yes            West   
# ℹ 5 more variables: state <chr>, star_rating <dbl>, review_time <chr>,
#   review_title <chr>, review_text <chr>

Arrange Observations

We can arrange observations to reveal helpful information and check data. By default this is ascending order, bu we can use the helper function desc() to modify the output.

arrange(customer_data, birth_year)
# A tibble: 10,531 × 13
   customer_id birth_year gender income credit married college_degree region   
         <dbl>      <dbl> <chr>   <dbl>  <dbl> <chr>   <chr>          <chr>    
 1        9585       1939 Female 182000   850  Yes     No             Northeast
 2        2409       1947 Male   227000   807. Yes     Yes            Midwest  
 3        4777       1947 Male   169000   651. Yes     Yes            Northeast
 4        5165       1947 Female 258000   850  No      Yes            Northeast
 5        8277       1947 Male    92000   839. Yes     No             West     
 6        8525       1947 Female 170000   723. Yes     Yes            West     
 7        9437       1947 Female  95000   850  No      Yes            West     
 8       10069       1947 Other   84000   743. Yes     Yes            West     
 9        4292       1948 Female 169000   760. Yes     No             Northeast
10        5282       1948 Male   176000   705. Yes     No             South    
# ℹ 10,521 more rows
# ℹ 5 more variables: state <chr>, star_rating <dbl>, review_time <chr>,
#   review_title <chr>, review_text <chr>

Select Variables

Sometimes we only care to keep certain variables, especially when working with a large dataset.

select(customer_data, region:review_text)
# A tibble: 10,531 × 6
   region    state star_rating review_time review_title              review_text
   <chr>     <chr>       <dbl> <chr>       <chr>                     <chr>      
 1 South     DC              4 06 11, 2015 Four Stars                everything…
 2 West      WA             NA <NA>        <NA>                      <NA>       
 3 South     AR             NA <NA>        <NA>                      <NA>       
 4 Midwest   MN             NA <NA>        <NA>                      <NA>       
 5 West      HI              5 03 25, 2008 Great Product!!           I looked a…
 6 Midwest   MN              2 06 7, 2013  Not at all flattering on… I ordered …
 7 Midwest   MN             NA <NA>        <NA>                      <NA>       
 8 South     KY              5 04 20, 2016 Super comfortable mini p… Super comf…
 9 West      NM             NA <NA>        <NA>                      <NA>       
10 Northeast VT              5 10 18, 2015 nice strong enough        Yeah       
# ℹ 10,521 more rows

There are a number of helper functions you can use with select(), including starts_with(), contains(), matches(), num_range(), all_of(), any_of(), and everything().

Mutate Variables

We can also recode existing variables or create new variables.

mutate(customer_data, income = income / 1000)
# A tibble: 10,531 × 13
   customer_id birth_year gender income credit married college_degree region   
         <dbl>      <dbl> <chr>   <dbl>  <dbl> <chr>   <chr>          <chr>    
 1        1001       1971 Female     73   742. No      No             South    
 2        1002       1970 Female     31   749. Yes     No             West     
 3        1003       1988 Male       35   542. No      No             South    
 4        1004       1984 Other      64   574. Yes     Yes            Midwest  
 5        1005       1987 Male       58   644. No      Yes            West     
 6        1006       1994 Male      164   554. Yes     Yes            Midwest  
 7        1007       1968 Male       39   608. No      No             Midwest  
 8        1008       1994 Male       69   710. No      No             South    
 9        1009       1958 Male      233   702. No      No             West     
10        1010       1994 Female     77   605. Yes     No             Northeast
# ℹ 10,521 more rows
# ℹ 5 more variables: state <chr>, star_rating <dbl>, review_time <chr>,
#   review_title <chr>, review_text <chr>

Two important things to remember:

  1. We can overwrite variables in a data frame as well as objects if we use the same name.
  2. Good variable (and object) names only use lowercase letters, numbers, and _. For example: good_name2 and 2bad name. Bad names should be renamed (but can be referenced if surrounded with ``).

Join Data Frames

A common variable (e.g., an ID) allows us to join two data frames.

store_transactions <- read_csv("store_transactions.csv")
Rows: 10531 Columns: 169
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
dbl (169): customer_id, jan_2005, feb_2005, mar_2005, apr_2005, may_2005, ju...

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
glimpse(store_transactions)
Rows: 10,531
Columns: 169
$ customer_id <dbl> 1001, 1002, 1003, 1004, 1005, 1006, 1007, 1008, 1009, 1010…
$ jan_2005    <dbl> 0, 0, 0, 0, 0, 0, 0, 3, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 0…
$ feb_2005    <dbl> 0, 0, 0, 0, 0, 3, 0, 0, 0, 0, 0, 0, 0, 0, 0, 3, 5, 0, 4, 0…
$ mar_2005    <dbl> 0, 4, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 1, 0, 5, 0…
$ apr_2005    <dbl> 4, 0, 0, 0, 0, 0, 4, 0, 0, 2, 0, 0, 0, 4, 0, 0, 0, 1, 0, 0…
$ may_2005    <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 4, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
$ jun_2005    <dbl> 4, 0, 0, 0, 4, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 2, 0…
$ jul_2005    <dbl> 0, 0, 0, 0, 0, 0, 0, 2, 4, 0, 4, 0, 0, 0, 0, 0, 0, 0, 0, 1…
$ aug_2005    <dbl> 0, 0, 0, 0, 5, 0, 0, 0, 0, 0, 2, 0, 0, 4, 0, 0, 0, 0, 0, 0…
$ sep_2005    <dbl> 0, 0, 0, 0, 0, 4, 0, 5, 4, 0, 0, 0, 0, 2, 0, 5, 0, 0, 0, 0…
$ oct_2005    <dbl> 4, 0, 0, 0, 3, 0, 5, 2, 0, 0, 0, 1, 0, 3, 0, 0, 0, 0, 0, 4…
$ nov_2005    <dbl> 0, 2, 0, 0, 4, 0, 0, 0, 0, 0, 5, 3, 0, 2, 2, 0, 2, 0, 5, 0…
$ dec_2005    <dbl> 3, 0, 1, 0, 0, 2, 1, 2, 5, 0, 0, 5, 0, 2, 4, 0, 2, 2, 3, 0…
$ jan_2006    <dbl> 0, 3, 0, 0, 0, 0, 0, 0, 0, 0, 5, 3, 0, 4, 2, 3, 0, 0, 4, 4…
$ feb_2006    <dbl> 2, 0, 0, 0, 0, 4, 0, 0, 0, 0, 5, 0, 4, 0, 0, 0, 0, 0, 0, 0…
$ mar_2006    <dbl> 0, 0, 0, 0, 0, 3, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0…
$ apr_2006    <dbl> 0, 0, 0, 0, 0, 0, 4, 3, 0, 0, 0, 0, 0, 0, 0, 3, 0, 2, 5, 3…
$ may_2006    <dbl> 0, 0, 0, 0, 0, 0, 0, 5, 0, 0, 4, 3, 0, 0, 0, 0, 0, 0, 0, 0…
$ jun_2006    <dbl> 3, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 3, 0, 0, 2, 4, 0, 5…
$ jul_2006    <dbl> 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 3, 0, 0, 0, 0, 0, 0, 0, 1, 4…
$ aug_2006    <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
$ sep_2006    <dbl> 0, 0, 0, 0, 5, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 4, 0, 0, 0…
$ oct_2006    <dbl> 0, 2, 1, 3, 0, 0, 0, 0, 0, 3, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
$ nov_2006    <dbl> 0, 0, 0, 4, 1, 1, 0, 0, 0, 3, 3, 3, 5, 1, 0, 0, 0, 0, 0, 0…
$ dec_2006    <dbl> 5, 0, 0, 0, 0, 1, 0, 3, 0, 0, 0, 0, 3, 5, 3, 0, 0, 0, 0, 3…
$ jan_2007    <dbl> 0, 0, 3, 0, 0, 0, 0, 0, 2, 0, 0, 0, 4, 4, 0, 0, 0, 0, 4, 3…
$ feb_2007    <dbl> 2, 0, 0, 0, 2, 0, 0, 3, 4, 0, 2, 0, 0, 0, 0, 0, 0, 0, 2, 0…
$ mar_2007    <dbl> 4, 0, 0, 0, 0, 0, 0, 0, 0, 3, 4, 0, 0, 0, 0, 0, 0, 0, 3, 0…
$ apr_2007    <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 4, 0, 0, 0, 0, 0, 0, 3, 0, 3, 0, 1…
$ may_2007    <dbl> 0, 0, 0, 3, 0, 0, 0, 3, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0…
$ jun_2007    <dbl> 0, 0, 0, 0, 4, 0, 0, 0, 0, 0, 0, 0, 0, 4, 0, 0, 1, 3, 1, 5…
$ jul_2007    <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 3, 3, 3, 0, 0, 2, 0, 0, 0, 0, 2…
$ aug_2007    <dbl> 0, 0, 0, 0, 3, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
$ sep_2007    <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 0, 0, 0, 0, 4, 3…
$ oct_2007    <dbl> 3, 3, 0, 0, 0, 3, 0, 0, 0, 0, 4, 3, 0, 0, 0, 0, 0, 5, 0, 0…
$ nov_2007    <dbl> 0, 2, 0, 5, 0, 5, 0, 0, 0, 5, 1, 0, 4, 0, 0, 2, 2, 3, 0, 3…
$ dec_2007    <dbl> 0, 4, 0, 0, 4, 0, 0, 0, 0, 0, 0, 4, 0, 1, 4, 0, 4, 0, 3, 0…
$ jan_2008    <dbl> 3, 0, 0, 0, 0, 0, 0, 0, 0, 0, 4, 0, 2, 0, 0, 4, 0, 0, 0, 0…
$ feb_2008    <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0…
$ mar_2008    <dbl> 5, 5, 0, 0, 0, 0, 3, 0, 0, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
$ apr_2008    <dbl> 0, 0, 1, 0, 0, 0, 0, 0, 0, 4, 0, 0, 0, 0, 0, 4, 0, 0, 1, 1…
$ may_2008    <dbl> 0, 0, 1, 0, 0, 0, 0, 0, 2, 0, 0, 0, 0, 0, 0, 0, 0, 4, 0, 0…
$ jun_2008    <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 2, 0, 0, 2, 2, 0, 0…
$ jul_2008    <dbl> 0, 0, 0, 0, 2, 2, 0, 0, 0, 1, 2, 0, 0, 4, 0, 2, 4, 0, 2, 0…
$ aug_2008    <dbl> 0, 0, 0, 0, 0, 0, 5, 3, 0, 0, 5, 0, 2, 0, 4, 0, 3, 0, 1, 0…
$ sep_2008    <dbl> 4, 0, 0, 0, 0, 0, 0, 0, 0, 3, 0, 0, 4, 0, 0, 5, 0, 0, 0, 2…
$ oct_2008    <dbl> 0, 4, 0, 0, 0, 0, 0, 0, 5, 4, 2, 0, 3, 0, 0, 3, 3, 0, 0, 4…
$ nov_2008    <dbl> 1, 0, 0, 2, 2, 4, 0, 0, 0, 0, 2, 0, 0, 0, 0, 4, 0, 2, 4, 0…
$ dec_2008    <dbl> 0, 0, 0, 0, 2, 3, 1, 0, 0, 0, 2, 2, 0, 4, 4, 0, 1, 0, 5, 1…
$ jan_2009    <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0…
$ feb_2009    <dbl> 2, 0, 0, 0, 5, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 5, 0…
$ mar_2009    <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 4, 0, 0, 0, 0, 0, 0, 0…
$ apr_2009    <dbl> 3, 0, 0, 1, 0, 0, 0, 0, 3, 0, 0, 0, 0, 4, 0, 0, 0, 0, 2, 5…
$ may_2009    <dbl> 0, 0, 0, 0, 2, 4, 0, 4, 0, 0, 0, 0, 0, 5, 4, 2, 0, 2, 1, 0…
$ jun_2009    <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 4, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
$ jul_2009    <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 2, 2, 3, 0, 1, 5, 3…
$ aug_2009    <dbl> 0, 4, 0, 0, 3, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 3…
$ sep_2009    <dbl> 0, 0, 0, 0, 0, 0, 0, 3, 0, 0, 0, 0, 0, 0, 0, 0, 3, 2, 0, 0…
$ oct_2009    <dbl> 5, 0, 3, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 3, 0, 0, 0, 0, 0, 0…
$ nov_2009    <dbl> 5, 0, 0, 0, 0, 1, 0, 2, 2, 0, 4, 2, 0, 3, 4, 0, 4, 1, 2, 4…
$ dec_2009    <dbl> 0, 0, 0, 3, 3, 0, 0, 4, 1, 4, 2, 2, 0, 0, 3, 0, 0, 0, 5, 1…
$ jan_2010    <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 2, 0, 5…
$ feb_2010    <dbl> 4, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 0, 0, 0, 0, 0, 0…
$ mar_2010    <dbl> 4, 0, 0, 0, 0, 0, 0, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 3, 0…
$ apr_2010    <dbl> 2, 0, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 4, 0, 0, 3, 0, 2, 0…
$ may_2010    <dbl> 2, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 5, 0, 0, 0, 0, 0, 0…
$ jun_2010    <dbl> 4, 0, 2, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 2, 0, 3…
$ jul_2010    <dbl> 0, 0, 4, 0, 0, 0, 5, 5, 0, 0, 0, 0, 0, 0, 0, 3, 0, 0, 4, 0…
$ aug_2010    <dbl> 0, 2, 0, 0, 3, 0, 0, 0, 0, 0, 2, 0, 4, 3, 0, 0, 0, 0, 0, 0…
$ sep_2010    <dbl> 0, 0, 2, 0, 0, 0, 0, 0, 3, 0, 2, 0, 0, 0, 0, 0, 0, 4, 0, 3…
$ oct_2010    <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 1, 2, 2, 0, 0, 4, 0, 0, 0, 0, 2, 0…
$ nov_2010    <dbl> 0, 0, 1, 2, 3, 5, 4, 1, 0, 4, 0, 3, 5, 0, 0, 0, 0, 3, 3, 0…
$ dec_2010    <dbl> 0, 0, 5, 3, 0, 0, 5, 0, 0, 0, 0, 0, 4, 4, 0, 5, 3, 0, 0, 5…
$ jan_2011    <dbl> 2, 0, 0, 0, 0, 0, 0, 0, 5, 0, 0, 0, 0, 0, 0, 0, 0, 5, 2, 1…
$ feb_2011    <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 4, 0, 0, 0, 0, 2, 0, 0, 0, 0…
$ mar_2011    <dbl> 0, 0, 0, 0, 3, 0, 0, 0, 0, 0, 2, 0, 2, 0, 0, 2, 0, 0, 0, 4…
$ apr_2011    <dbl> 0, 0, 2, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 2, 2, 3, 0…
$ may_2011    <dbl> 0, 0, 0, 0, 0, 0, 0, 2, 0, 4, 0, 0, 0, 1, 0, 0, 0, 0, 0, 4…
$ jun_2011    <dbl> 0, 0, 0, 1, 0, 0, 0, 0, 0, 4, 0, 0, 0, 0, 3, 0, 0, 0, 1, 0…
$ jul_2011    <dbl> 4, 0, 0, 0, 5, 0, 0, 0, 3, 0, 0, 0, 0, 0, 0, 0, 0, 0, 5, 0…
$ aug_2011    <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 4, 0, 0, 1, 2…
$ sep_2011    <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
$ oct_2011    <dbl> 0, 0, 0, 5, 0, 0, 0, 0, 0, 0, 2, 0, 1, 0, 0, 0, 0, 0, 0, 0…
$ nov_2011    <dbl> 5, 0, 0, 1, 0, 0, 5, 1, 2, 0, 0, 0, 4, 0, 5, 0, 0, 2, 3, 2…
$ dec_2011    <dbl> 1, 4, 0, 4, 1, 5, 0, 4, 0, 0, 2, 0, 0, 0, 0, 2, 2, 4, 3, 4…
$ jan_2012    <dbl> 3, 0, 0, 0, 0, 3, 0, 0, 4, 4, 0, 0, 0, 0, 0, 2, 0, 0, 0, 0…
$ feb_2012    <dbl> 4, 0, 0, 0, 0, 0, 0, 0, 0, 5, 0, 3, 0, 0, 5, 1, 0, 0, 0, 0…
$ mar_2012    <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 4, 0, 0, 0, 0, 2, 0, 0, 0, 3, 0, 0…
$ apr_2012    <dbl> 0, 0, 0, 0, 0, 0, 0, 3, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 3, 0…
$ may_2012    <dbl> 4, 0, 0, 0, 0, 0, 0, 2, 0, 0, 0, 0, 0, 0, 0, 3, 0, 0, 0, 2…
$ jun_2012    <dbl> 0, 0, 0, 0, 2, 0, 0, 0, 3, 3, 0, 0, 0, 0, 3, 0, 5, 0, 2, 0…
$ jul_2012    <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 3, 3, 0, 0, 0, 0, 0, 0, 0, 0…
$ aug_2012    <dbl> 0, 0, 0, 0, 0, 0, 0, 3, 2, 3, 0, 0, 0, 0, 1, 0, 0, 4, 2, 0…
$ sep_2012    <dbl> 0, 5, 1, 4, 0, 0, 0, 2, 0, 5, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1…
$ oct_2012    <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 3, 0, 0, 4, 0, 0, 0, 0, 0, 0…
$ nov_2012    <dbl> 3, 2, 0, 0, 0, 0, 0, 5, 4, 4, 2, 0, 0, 5, 0, 2, 3, 0, 1, 5…
$ dec_2012    <dbl> 0, 0, 0, 0, 0, 5, 0, 0, 0, 0, 4, 0, 4, 4, 1, 4, 0, 3, 4, 0…
$ jan_2013    <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 3, 0, 0, 0, 4, 4…
$ feb_2013    <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 4, 0, 0, 0, 0, 2, 0, 0, 0, 0…
$ mar_2013    <dbl> 0, 4, 0, 3, 4, 0, 0, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 4, 0, 0…
$ apr_2013    <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 3, 3, 0, 0, 0, 3, 0, 0, 0, 4, 0, 0…
$ may_2013    <dbl> 0, 0, 0, 0, 0, 0, 0, 3, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
$ jun_2013    <dbl> 0, 0, 0, 0, 0, 0, 0, 2, 0, 0, 0, 0, 0, 2, 0, 3, 5, 0, 0, 0…
$ jul_2013    <dbl> 0, 0, 4, 0, 4, 0, 4, 0, 0, 0, 0, 0, 0, 0, 0, 5, 0, 0, 4, 0…
$ aug_2013    <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 5, 0, 0, 0, 0, 0, 0, 0, 0, 0…
$ sep_2013    <dbl> 1, 0, 0, 2, 0, 0, 0, 4, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
$ oct_2013    <dbl> 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0…
$ nov_2013    <dbl> 0, 0, 0, 0, 4, 0, 2, 1, 4, 2, 3, 0, 0, 3, 0, 0, 2, 0, 3, 3…
$ dec_2013    <dbl> 0, 4, 4, 0, 4, 2, 0, 0, 4, 0, 0, 3, 2, 3, 3, 2, 0, 3, 2, 2…
$ jan_2014    <dbl> 3, 0, 1, 0, 0, 5, 0, 0, 0, 0, 0, 0, 0, 0, 0, 3, 0, 0, 0, 0…
$ feb_2014    <dbl> 0, 2, 0, 0, 0, 1, 0, 4, 0, 5, 0, 0, 5, 1, 0, 0, 0, 3, 1, 0…
$ mar_2014    <dbl> 1, 0, 0, 0, 0, 0, 2, 0, 0, 4, 0, 0, 0, 0, 0, 0, 0, 0, 4, 2…
$ apr_2014    <dbl> 0, 3, 0, 0, 0, 0, 0, 4, 0, 0, 0, 0, 3, 0, 0, 0, 2, 0, 0, 0…
$ may_2014    <dbl> 0, 3, 0, 0, 0, 0, 0, 0, 0, 4, 4, 0, 3, 2, 0, 0, 0, 0, 0, 2…
$ jun_2014    <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 5, 0, 0, 0, 0, 5, 3, 0…
$ jul_2014    <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 0, 2, 0, 0, 0, 0, 4, 0, 0…
$ aug_2014    <dbl> 2, 3, 0, 0, 0, 4, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 3, 0…
$ sep_2014    <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 3, 0, 0, 0, 0, 1, 2, 5, 0, 1, 4, 0…
$ oct_2014    <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 5, 3, 0, 0, 0, 0, 2…
$ nov_2014    <dbl> 4, 2, 3, 4, 0, 3, 3, 0, 0, 0, 0, 0, 4, 4, 0, 4, 0, 0, 1, 3…
$ dec_2014    <dbl> 3, 0, 4, 0, 0, 1, 0, 0, 0, 3, 4, 0, 3, 0, 0, 0, 0, 2, 0, 5…
$ jan_2015    <dbl> 0, 0, 0, 0, 0, 0, 1, 0, 5, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
$ feb_2015    <dbl> 0, 0, 0, 0, 4, 0, 5, 0, 0, 0, 4, 0, 0, 0, 0, 3, 0, 0, 0, 0…
$ mar_2015    <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 3, 4, 0…
$ apr_2015    <dbl> 0, 0, 0, 0, 3, 2, 0, 3, 0, 2, 0, 0, 0, 0, 0, 0, 0, 0, 3, 0…
$ may_2015    <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 0, 1, 0, 0, 0, 0, 0, 0, 4…
$ jun_2015    <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 5, 4, 0, 0, 0, 0, 0, 2, 0, 0, 1, 0…
$ jul_2015    <dbl> 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 5, 0, 3, 4, 5…
$ aug_2015    <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 5, 0, 0, 0, 0, 0, 0, 5…
$ sep_2015    <dbl> 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 3, 0, 0, 0, 0, 0, 4, 3…
$ oct_2015    <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 0, 0, 0, 3, 0, 5, 0, 0, 0, 0…
$ nov_2015    <dbl> 3, 0, 0, 0, 1, 4, 0, 3, 4, 3, 0, 0, 1, 4, 0, 0, 0, 0, 4, 4…
$ dec_2015    <dbl> 0, 0, 0, 1, 0, 0, 0, 5, 0, 4, 0, 0, 0, 3, 3, 0, 0, 0, 3, 3…
$ jan_2016    <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 2, 0, 0, 0, 2, 2, 0, 0, 0, 0, 0, 3…
$ feb_2016    <dbl> 0, 3, 0, 0, 0, 0, 0, 0, 0, 0, 5, 0, 0, 0, 0, 0, 0, 0, 0, 1…
$ mar_2016    <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 3, 0, 3, 0, 1, 0, 0, 0, 3, 0…
$ apr_2016    <dbl> 3, 0, 0, 3, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 3, 2…
$ may_2016    <dbl> 0, 0, 4, 0, 0, 0, 0, 0, 0, 0, 3, 0, 0, 0, 0, 0, 0, 0, 0, 3…
$ jun_2016    <dbl> 0, 0, 0, 0, 0, 3, 0, 0, 0, 5, 0, 0, 0, 0, 0, 4, 0, 0, 0, 0…
$ jul_2016    <dbl> 4, 0, 4, 0, 0, 0, 5, 0, 0, 0, 0, 0, 0, 4, 3, 0, 0, 0, 0, 0…
$ aug_2016    <dbl> 0, 0, 0, 0, 0, 0, 3, 0, 0, 0, 0, 0, 4, 0, 0, 2, 0, 0, 3, 0…
$ sep_2016    <dbl> 0, 0, 4, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 2, 0, 3, 0…
$ oct_2016    <dbl> 3, 0, 0, 0, 5, 0, 0, 0, 0, 0, 5, 0, 0, 0, 4, 0, 0, 4, 1, 0…
$ nov_2016    <dbl> 5, 0, 0, 4, 2, 3, 3, 0, 0, 3, 0, 0, 0, 4, 0, 2, 0, 2, 0, 0…
$ dec_2016    <dbl> 2, 1, 0, 0, 0, 0, 1, 5, 0, 0, 0, 1, 3, 0, 4, 2, 2, 0, 2, 0…
$ jan_2017    <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 3, 0, 4, 0…
$ feb_2017    <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 3, 0, 0, 0, 0, 0, 0…
$ mar_2017    <dbl> 0, 1, 0, 0, 0, 1, 0, 2, 3, 0, 0, 0, 0, 4, 0, 5, 0, 0, 0, 4…
$ apr_2017    <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 2, 0, 0, 0, 0, 0, 0, 0, 3, 0…
$ may_2017    <dbl> 0, 0, 0, 5, 0, 0, 0, 0, 0, 1, 2, 0, 0, 2, 0, 3, 3, 0, 4, 3…
$ jun_2017    <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 3, 0, 0, 0, 4, 0, 0, 2, 0, 0…
$ jul_2017    <dbl> 3, 0, 0, 1, 0, 0, 0, 0, 0, 2, 0, 0, 0, 0, 3, 3, 0, 0, 0, 2…
$ aug_2017    <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 4, 0, 0, 2, 0, 0, 0, 1, 1, 0, 0…
$ sep_2017    <dbl> 2, 0, 0, 0, 0, 0, 4, 0, 0, 0, 0, 0, 0, 5, 4, 0, 0, 0, 0, 0…
$ oct_2017    <dbl> 0, 0, 0, 0, 0, 2, 1, 0, 0, 0, 0, 0, 3, 0, 0, 0, 0, 0, 4, 0…
$ nov_2017    <dbl> 0, 2, 5, 0, 0, 1, 4, 0, 2, 0, 5, 0, 2, 5, 0, 0, 3, 0, 4, 0…
$ dec_2017    <dbl> 0, 0, 4, 0, 0, 0, 0, 5, 5, 5, 4, 4, 0, 0, 2, 4, 0, 3, 0, 2…
$ jan_2018    <dbl> 1, 0, 4, 0, 0, 0, 0, 0, 0, 0, 0, 0, 4, 2, 0, 0, 0, 0, 3, 1…
$ feb_2018    <dbl> 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
$ mar_2018    <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 0, 0, 0, 0, 0, 1, 0, 2, 0…
$ apr_2018    <dbl> 0, 0, 0, 2, 0, 0, 4, 0, 0, 0, 0, 1, 0, 0, 0, 2, 1, 4, 0, 0…
$ may_2018    <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0…
$ jun_2018    <dbl> 4, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 4, 0, 2, 0, 0, 0, 0, 3…
$ jul_2018    <dbl> 0, 0, 0, 0, 0, 4, 0, 0, 0, 5, 0, 0, 0, 0, 0, 2, 0, 0, 0, 5…
$ aug_2018    <dbl> 0, 0, 0, 0, 0, 0, 4, 0, 0, 3, 0, 0, 0, 0, 0, 4, 0, 0, 0, 5…
$ sep_2018    <dbl> 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 5, 0, 2, 0, 0, 0, 0, 2, 4, 0…
$ oct_2018    <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 4, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
$ nov_2018    <dbl> 0, 0, 4, 0, 3, 0, 0, 2, 0, 4, 1, 0, 1, 0, 0, 0, 0, 0, 5, 0…
$ dec_2018    <dbl> 0, 5, 0, 0, 2, 0, 0, 4, 0, 0, 0, 4, 0, 4, 1, 0, 0, 0, 2, 0…

A left join keeps all rows in the “left” data frame plus all columns from both data frames. A left join is a good “default” join to start with.

left_join(customer_data, store_transactions, by = "customer_id")
# A tibble: 10,531 × 181
   customer_id birth_year gender income credit married college_degree region   
         <dbl>      <dbl> <chr>   <dbl>  <dbl> <chr>   <chr>          <chr>    
 1        1001       1971 Female  73000   742. No      No             South    
 2        1002       1970 Female  31000   749. Yes     No             West     
 3        1003       1988 Male    35000   542. No      No             South    
 4        1004       1984 Other   64000   574. Yes     Yes            Midwest  
 5        1005       1987 Male    58000   644. No      Yes            West     
 6        1006       1994 Male   164000   554. Yes     Yes            Midwest  
 7        1007       1968 Male    39000   608. No      No             Midwest  
 8        1008       1994 Male    69000   710. No      No             South    
 9        1009       1958 Male   233000   702. No      No             West     
10        1010       1994 Female  77000   605. Yes     No             Northeast
# ℹ 10,521 more rows
# ℹ 173 more variables: state <chr>, star_rating <dbl>, review_time <chr>,
#   review_title <chr>, review_text <chr>, jan_2005 <dbl>, feb_2005 <dbl>,
#   mar_2005 <dbl>, apr_2005 <dbl>, may_2005 <dbl>, jun_2005 <dbl>,
#   jul_2005 <dbl>, aug_2005 <dbl>, sep_2005 <dbl>, oct_2005 <dbl>,
#   nov_2005 <dbl>, dec_2005 <dbl>, jan_2006 <dbl>, feb_2006 <dbl>,
#   mar_2006 <dbl>, apr_2006 <dbl>, may_2006 <dbl>, jun_2006 <dbl>, …

An inner join keeps rows that have matching IDs along with all columns from both data frames.

inner_join(customer_data, store_transactions, by = "customer_id")
# A tibble: 10,531 × 181
   customer_id birth_year gender income credit married college_degree region   
         <dbl>      <dbl> <chr>   <dbl>  <dbl> <chr>   <chr>          <chr>    
 1        1001       1971 Female  73000   742. No      No             South    
 2        1002       1970 Female  31000   749. Yes     No             West     
 3        1003       1988 Male    35000   542. No      No             South    
 4        1004       1984 Other   64000   574. Yes     Yes            Midwest  
 5        1005       1987 Male    58000   644. No      Yes            West     
 6        1006       1994 Male   164000   554. Yes     Yes            Midwest  
 7        1007       1968 Male    39000   608. No      No             Midwest  
 8        1008       1994 Male    69000   710. No      No             South    
 9        1009       1958 Male   233000   702. No      No             West     
10        1010       1994 Female  77000   605. Yes     No             Northeast
# ℹ 10,521 more rows
# ℹ 173 more variables: state <chr>, star_rating <dbl>, review_time <chr>,
#   review_title <chr>, review_text <chr>, jan_2005 <dbl>, feb_2005 <dbl>,
#   mar_2005 <dbl>, apr_2005 <dbl>, may_2005 <dbl>, jun_2005 <dbl>,
#   jul_2005 <dbl>, aug_2005 <dbl>, sep_2005 <dbl>, oct_2005 <dbl>,
#   nov_2005 <dbl>, dec_2005 <dbl>, jan_2006 <dbl>, feb_2006 <dbl>,
#   mar_2006 <dbl>, apr_2006 <dbl>, may_2006 <dbl>, jun_2006 <dbl>, …

Consecutive Function Calls

We typically want to perform many consecutive function calls. You might be tempted to do the following. Don’t do this!

crm_data_2 <- left_join(customer_data, store_transactions, by = "customer_id")
crm_data_3 <- filter(crm_data_2, region == "West", feb_2005 == max(feb_2005))
crm_data_4 <- mutate(crm_data_3, age = 2022 - birth_year)
crm_data_5 <- select(crm_data_4, age, feb_2005)
crm_data_6 <- arrange(crm_data_5, desc(age))
crm_data_7 <- slice(crm_data_6, 1)

crm_data_7
# A tibble: 1 × 2
    age feb_2005
  <dbl>    <dbl>
1    65        5

This would be another way run the same code. Don’t do this either!

slice(
  arrange(
    select(
      mutate(
        filter(
          left_join(
            customer_data, store_transactions, by = "customer_id"), 
          region == "West", feb_2005 == max(feb_2005)
        ), 
      age = 2022 - birth_year), 
    age, feb_2005), 
  desc(age)), 
1)
# A tibble: 1 × 2
    age feb_2005
  <dbl>    <dbl>
1    65        5

Consecutive Lines of Code

Part of the common philosophy for the tidyverse is that:

  • Each function should do one specific thing well.
  • Each function should have a data frame as an input and a data frame as an output.

This allows us to |> together functions in consecutive lines of code so that it is easier for humans to read and less error-prone.

filter(customer_data, credit == 850, state == "WY")
# A tibble: 1 × 13
  customer_id birth_year gender income credit married college_degree region
        <dbl>      <dbl> <chr>   <dbl>  <dbl> <chr>   <chr>          <chr> 
1        1335       1978 Female 112000    850 No      No             West  
# ℹ 5 more variables: state <chr>, star_rating <dbl>, review_time <chr>,
#   review_title <chr>, review_text <chr>
customer_data |> 
  filter(.data = _, credit == 850, state == "WY")
# A tibble: 1 × 13
  customer_id birth_year gender income credit married college_degree region
        <dbl>      <dbl> <chr>   <dbl>  <dbl> <chr>   <chr>          <chr> 
1        1335       1978 Female 112000    850 No      No             West  
# ℹ 5 more variables: state <chr>, star_rating <dbl>, review_time <chr>,
#   review_title <chr>, review_text <chr>
  • So what does the |> do?
  • What’s the deal with the indented line after the |>?
  • How is this different from the %>%?
  • Let’s rewrite our terrible code using the |>.

Read |> as then. (If we need to use <-, we typically put it at the beginning.)

customer_data |> 
  left_join(store_transactions, by = "customer_id") |> 
  filter(region == "West", feb_2005 == max(feb_2005)) |> 
  mutate(age = 2023 - birth_year) |> 
  select(age, feb_2005) |> 
  arrange(desc(age)) |> 
  slice(1)
# A tibble: 1 × 2
    age feb_2005
  <dbl>    <dbl>
1    66        5

Quarto

What is Quarto?

Quarto is an open-source scientific and technical publishing system built on Pandoc.

You can write a report or presentation in Word, PowerPoint, etc.; then run code in an R script; then save output and visualizations; and then paste them in the report or the presentation…

…or you can do it all at once in Quarto.

The Next Generation of R Markdown

Quarto is multilingual, used with R, Python, Julia, and Observable from RStudio, VS Code, and Jupyter.

  • Natively incorporates much of the functionality developed for R Markdown
  • CLI
  • Lua filters
  • Development in terms of Lua filters

Daily Work: Commit, Push, Pull

Select the right branch first! Open up PROJECT.qmd.

Structure of a Document

  1. Content (text, figures, tables, diagrams, etc.)
  2. Code and output
  3. Metadata (format, appearance, etc.)

YAML

The header is where (most of) the metadata lives in YAML.

---
title: "Project Name"
format: gfm
---

While R Markdown can produce HTML documents, PowerPoint slides, etc., format: gfm (i.e., GitHub-flavored markdown) is what will produce the associated .md files that GitHub reads when rendered. When you make a presentation, it will be format: revealjs (see your repo’s README for details).

Content Basics

  • Use ## headings and ### sub-headings.
  • Produce bullet points using -s.
  • Identify functions and data in the text with `s, italics with *s, and bold with **s.
  • RStudio has a visual editor, but there may be some weirdness switching back and forth.
  • Pay attention to required white space.
  • Render the document to produce the specified output.

Code Cells

Code cells are language-specific.

#| echo: fenced
transcripts <- read_rds(here::here("Data", "transcripts.rds")) |> 
  tibble() |> 
  separate(doc_id, into = c("gvkey", "call_date", "title"), sep = "_")

The hashpipe operator #| allows you to specify metadata using YAML for the code cell specifically.

Divs and Spans

Divs and Spans are generic HTML/CSS elements that are used to format a document.

  • Divs are used for block-level formatting.
  • Spans are used for in-line formatting.

This is incredibly useful if you want to format an entire block of content, such as a paragraph. For example, using the .callout-warning class, I can quickly make this:

Warning

Pay attention to this information!

Milestone 1: Draft an Abstract

Using your seminar repo and week 1’s branch, modify PROJECT.qmd by drafting a paragraph-long “abstract,” a description of the kind of project you’d like to work on. A few things to consider:

  • The project will be a kind of regression.
  • Clearly identify what the outcome of interest is.
  • Specify some of the possibly explanatory variables.
  • One of the explanatory variables needs to be a kind of group variable.

Render PROJECT.qmd into PROJECT.md once you’re finished. Commit changes and push to GitHub. On GitHub, create a pull request and merge week 1’s branch into main and delete the branch when finished.

Get started on next week’s presentation by picking and reading one paper from the shared PhD seminar syllabus.